METHOD FOR ENCODING AN AUDIO SIGNAL
Patent abstract:
Embedded speech and audio coding using a switchable model core. A method for processing an audio signal includes classifying an input frame as either a speech frame or a generic audio frame; producing an encoded bit stream and a corresponding processed frame based on the input frame; producing an enhancement layer encoded bit stream based on a difference between the input frame and the processed frame; and multiplexing the enhancement layer encoded bit stream, a codeword, and either a speech encoded bit stream or a generic audio encoded bit stream into a combined bit stream, based on whether the codeword indicates that the input frame is classified as a speech frame or as a generic audio frame, wherein the encoded bit stream is either a speech encoded bit stream or a generic audio encoded bit stream.
Publication number: BR112012016370B1
Application number: R112012016370-1
Filing date: 2010-11-29
Publication date: 2020-09-15
Inventors: James P. Ashley; Jonathan Alastair Gibbs; Udar Mittal
Applicant: Google Technology Holdings LLC
IPC main class:
Patent description:
FIELD OF THE DISCLOSURE
The present disclosure relates generally to speech and audio coding and, more particularly, to embedded speech and audio coding using a hybrid core codec with enhancement coding.

BACKGROUND
Speech encoders based on source-filter models are known to have quality problems when processing generic audio input signals, such as music, tones, background noise, and even reverberant speech. Such codecs include Linear Predictive Coding (LPC) processors such as Code Excited Linear Prediction (CELP) encoders. Speech encoders tend to process speech signals well even at low bit rates. Conversely, generic audio coding systems based on auditory models typically do not process speech signals very well, owing to sensitivity to distortion in human speech together with bit rate limitations. One solution to this problem has been to provide a classifier that determines, on a frame-by-frame basis, whether an input signal is more or less speech-like, and then to select the appropriate encoder, that is, a speech encoder or a generic audio encoder, based on the classification. An audio signal processor capable of processing different types of signals in this way is sometimes referred to as a hybrid core codec. An example of a practical system using a speech/generic audio input discriminator is described in EVRC-WB (3GPP2 C.S0014-C). The problem with this approach is that, as a practical matter, it is often difficult to distinguish between speech and generic audio inputs, particularly when the input signal is near the switching threshold. For example, signals containing a combination of speech and music, or reverberant speech, can cause frequent switching between the speech and generic audio encoders, resulting in a processed signal of inconsistent sound quality. Another solution for providing good quality on both speech and generic audio is to use an audio transform domain enhancement layer on top of a speech encoder output. This method subtracts the speech encoder output signal from the input signal and then transforms the resulting error signal into the frequency domain, where it is further encoded. This method is used in Recommendation ITU-T G.718. The problem with this solution is that, when a generic audio signal is used as input to the speech encoder, the output can be distorted, sometimes severely, and a substantial portion of the enhancement layer coding effort goes into reversing the effect of the noise produced by the signal model mismatch, which limits the overall quality attainable at a given bit rate.

The various aspects, features and advantages of the invention will become more fully apparent to those of ordinary skill in the art upon careful consideration of the following detailed description together with the accompanying drawings described below. The drawings may have been simplified for clarity and are not necessarily drawn to scale.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a diagram of an audio signal encoding process.
Figure 2 is a schematic block diagram of a hybrid core codec suitable for processing speech and generic audio signals.
Figure 3 is a schematic block diagram of an alternative hybrid core codec suitable for processing speech and generic audio signals.
Figure 4 is a diagram of an audio signal decoding process.
Figure 5 illustrates a decoder portion of a hybrid core codec.
DETAILED DESCRIPTION
The present disclosure is directed generally to methods and apparatus for processing audio signals and, more particularly, to processing audio signals arranged in a sequence, for example, a sequence of frames or subframes. The input audio signals comprising the frames are typically digitized. The signal units are generally classified, on a unit-by-unit basis, as being most suitable for one of at least two different encoding schemes. In one embodiment, the coded units or frames are combined with an error signal and an indication of the coding scheme for storage or communication. The disclosure is also directed to methods and apparatus for decoding the combination of the coded units and the error signal based on the coding scheme indication. These and other aspects of the disclosure are discussed in more detail below.

In one embodiment, audio signals are classified as being more or less speech-like, wherein the more speech-like frames are processed with a codec better suited to speech-like signals, and the less speech-like frames are processed with a codec better suited to less speech-like signals. The present disclosure is not limited to the processing of audio signal frames classified as speech or generic audio signals. More generally, the disclosure is directed to processing audio signal frames with one of at least two different encoders, without regard to the type of codec and without regard to the criteria used to determine which encoding scheme is applied to a particular frame. In the present application, the less speech-like signals are referred to as generic audio signals. Generic audio signals, however, are not necessarily devoid of speech. Generic audio signals may include music, tones, background noise, or combinations thereof, alone or in combination with some speech. A generic audio signal may also include reverberant speech. That is, a speech signal that has been corrupted by large amounts of acoustic reflections (reverberation) may be better suited for encoding by a generic audio encoder, since the parameters of the model on which the speech encoding algorithm is based may have been compromised to some degree. In one embodiment, a frame classified as a generic audio frame includes non-speech with speech in the background, or speech with non-speech in the background. In another embodiment, a generic audio frame includes a portion that is predominantly non-speech and another, less prominent, portion that is predominantly speech.

In process 100 of Figure 1, at 110, an input frame in a sequence of frames is classified as being one of at least two different pre-specified types of frames. In the exemplary implementation, an input audio signal comprises a sequence of frames, each of which is classified either as a speech frame or as a generic audio frame. More generally, however, the input frames could be classified as one of at least two different types of audio frames. In other words, the frames do not necessarily have to be distinguished based on whether they are speech frames or generic audio frames. In general, the input frames may be evaluated to determine the best way to encode each frame. For example, a sequence of generic audio frames may be evaluated to determine the best way to encode the frames using one of at least two different codecs. The classification of audio frames is generally well known to those of ordinary skill in the art, and thus a more detailed discussion of the criteria and mechanism of discrimination is beyond the scope of the instant disclosure.
The classification can take place either before coding or after coding, as will be discussed below. Figure 2 illustrates a first schematic block diagram of an audio signal processor 200, which processes the frames of an input audio signal s(n), where "n" is an audio sample index. The audio signal processor comprises a mode selector 210 that classifies frames of the input audio signal s(n). Figure 3 likewise illustrates a schematic block diagram of another audio signal processor 300 comprising a mode selector 310 that classifies frames of an input audio signal s(n). The exemplary mode selectors determine whether frames of the input audio signal are more or less speech-like. More generally, however, other criteria of the input audio frames may be assessed as a basis for the mode selection. In both Figures 2 and 3, a mode selection codeword is generated by the mode selector and supplied to a multiplexer 220 and 320, respectively. The codeword may comprise one or more mode bits indicative of the mode of operation. In particular, the codeword indicates, on a frame-by-frame basis, the manner in which a corresponding frame of the input signal is processed. Thus, for example, the codeword indicates whether an input audio frame is processed as a speech signal or as a generic audio signal.

In Figure 1, at 120, an encoded bit stream and a corresponding processed frame are produced based on a corresponding frame of the input audio signal. In Figure 2, the audio signal processor 200 comprises a speech encoder 230 and a generic audio encoder 240. The speech encoder is, for example, a Code Excited Linear Prediction (CELP) encoder or some other encoder particularly suited to encoding speech signals. The generic audio encoder is, for example, a Time Domain Aliasing Cancellation (TDAC) encoder, such as a modified discrete cosine transform (MDCT) based encoder. More generally, however, encoders 230 and 240 could be any two different encoders. For example, the encoders could be different types of CELP-class encoders optimized for different types of speech. The encoders could also be different types of TDAC-class encoders, or of some other class of encoders. As suggested, each encoder produces an encoded bit stream based on the corresponding input audio frame processed by the encoder. Each encoder also produces a corresponding processed frame, which is a reconstruction of the input signal, indicated by sc(n). The reconstructed signal is obtained by decoding the encoded bit stream. For convenience of illustration, the encoding and decoding functions are represented by a single functional block in the drawings, but the generation of the encoded bit stream could be represented by an encoding block and the reconstruction of the input signal by a separate decoding block. Thus, producing the reconstructed frame entails both encoding and decoding.

In Figure 2, the first and second encoders 230 and 240 have inputs coupled to the input audio signal by a selection switch 250, which is controlled based on the mode selected or determined by mode selector 210. For example, switch 250 may be controlled by a processor based on the codeword output of the mode selector. Switch 250 selects speech encoder 230 for processing speech frames, and switch 250 selects the generic audio encoder for processing generic audio frames. In Figure 2, each frame is processed by only one encoder, for example, either the speech encoder or the generic audio encoder, by virtue of the selection switch 250.
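In code form, the Figure 2 encode path can be summarized as the following minimal sketch. The classify_frame, speech_enc, generic_enc, and enh_enc objects are hypothetical stand-ins for mode selector 210, encoders 230 and 240, and the enhancement layer encoder 270 discussed below; the sketch illustrates the per-frame switching and codeword logic only, not the patented implementation.

```python
SPEECH, GENERIC_AUDIO = 0, 1  # assumed one-bit codeword values

def encode_frame(s, classify_frame, speech_enc, generic_enc, enh_enc):
    """Sketch of the Figure 2 encode path for one input frame s(n).

    Each hypothetical core encoder returns its encoded bit stream
    together with the locally reconstructed frame sc(n)."""
    codeword = classify_frame(s)                  # mode selector 210
    core = speech_enc if codeword == SPEECH else generic_enc  # switch 250
    core_bits, s_c = core.encode(s)               # bit stream and sc(n)
    error = s - s_c                               # difference signal generator 260
    enh_bits = enh_enc.encode(error, codeword)    # enhancement layer encoder 270
    return codeword, core_bits, enh_bits          # elements combined by multiplexer 220
```

The returned triple corresponds to the three elements that the multiplexer combines into the combined bit stream, as described at 140 below.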
While only two encoders are illustrated in Figure 2, more generally the frames may be processed by one of several different encoders. For example, one of three or more encoders may be selected to process a particular frame of the input audio signal. In other embodiments, however, each frame is processed by all encoders, as will be discussed below.

In Figure 2, a switch 252 at the output of encoders 230 and 240 couples the processed output of the selected encoder to multiplexer 220. More particularly, the switch couples the encoded bit stream output of the selected encoder to the multiplexer. Switch 252 is controlled based on the mode selected or determined by mode selector 210. For example, switch 252 may be controlled by a processor based on the codeword output of mode selector 210. Multiplexer 220 multiplexes the codeword with the encoded bit stream output of the corresponding encoder selected based on the codeword. Thus, for generic audio frames, switch 252 couples the output of generic audio encoder 240 to multiplexer 220, and for speech frames switch 252 couples the output of speech encoder 230 to the multiplexer.

In Figure 3, the input audio signal is applied directly to the first and second encoders 330 and 340, without the use of a selection switch such as switch 250 of Figure 2. In the processor of Figure 3, each frame of the input audio signal is processed by all encoders, for example, by speech encoder 330 and by generic audio encoder 340. Generally, each encoder produces an encoded bit stream based on the corresponding input audio frame processed by the encoder. Each encoder also produces a corresponding processed frame by decoding the encoded bit stream, where the processed frame is a reconstruction of the input frame, indicated by sc(n). Generally, the input audio signal may be delayed by a delay entity, not shown, inherent to the first and/or second encoder. The input audio signal may also be subjected to filtering by a filtering entity, not shown, that precedes the first or second encoder. In one embodiment, the filtering entity performs re-sampling or rate conversion processing of the input signal. For example, an input audio signal sampled at 8, 16, or 32 kHz can be converted to a 12.8 kHz signal, a sampling rate typical of speech codecs. More generally, while only two encoders are illustrated in Figure 3, there may be multiple encoders.

In Figure 3, a switch 352 at the output of encoders 330 and 340 couples the processed output of the selected encoder to multiplexer 320. More particularly, the switch couples the encoded bit stream output of the selected encoder to the multiplexer. Switch 352 is controlled based on the mode selected or determined by mode selector 310. For example, switch 352 may be controlled by a processor based on the codeword output of mode selector 310. Multiplexer 320 multiplexes the codeword with the encoded bit stream output of the corresponding encoder selected based on the codeword. Thus, for generic audio frames, switch 352 couples the output of generic audio encoder 340 to multiplexer 320, and for speech frames switch 352 couples the output of speech encoder 330 to the multiplexer.
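The Figure 3 arrangement, in which every frame runs through both encoders before the mode is settled, can be sketched in the same style. The interfaces are again hypothetical, and the selection rule shown (comparing the energies of the two difference signals) anticipates the closed-loop classification described below in connection with mode selection entity 310.

```python
import numpy as np

SPEECH, GENERIC_AUDIO = 0, 1  # assumed one-bit codeword values

def encode_frame_closed_loop(s, speech_enc, generic_enc, enh_enc):
    """Sketch of the Figure 3 encode path: both encoders process s(n)."""
    sp_bits, sp_rec = speech_enc.encode(s)      # speech encoder 330
    ga_bits, ga_rec = generic_enc.encode(s)     # generic audio encoder 340
    e_speech = s - sp_rec                       # first difference signal
    e_generic = s - ga_rec                      # second difference signal
    # Closed-loop mode selection: pick the core whose reconstruction
    # error has the lower energy (one of the criteria contemplated).
    if np.sum(e_speech ** 2) <= np.sum(e_generic ** 2):
        codeword, core_bits, error = SPEECH, sp_bits, e_speech
    else:
        codeword, core_bits, error = GENERIC_AUDIO, ga_bits, e_generic
    enh_bits = enh_enc.encode(error, codeword)  # enhancement layer encoder 370
    return codeword, core_bits, enh_bits
```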
In Figure 1, at 130, an enhancement layer encoded bit stream is produced based on a difference between the input frame and the corresponding processed frame generated by the selected encoder. As noted, the processed frame is a reconstructed frame sc(n).

In the processor of Figure 2, a difference signal is generated by a difference signal generator 260 based on a frame of the input audio signal and the corresponding processed frame returned by the encoder associated with the selected mode, as indicated by the codeword. A switch 254 at the output of encoders 230 and 240 couples the output of the selected encoder to the difference signal generator 260. The difference signal is identified as an error signal E. The difference signal is the input to an enhancement layer encoder 270, which generates the enhancement layer bit stream based on the difference signal. In the alternative processor of Figure 3, a difference signal is generated by a difference signal generator 360 based on a frame of the input audio signal and the corresponding processed frame returned by the encoder associated with the selected mode, as indicated by the codeword. A switch 354 at the output of encoders 330 and 340 couples the output of the selected encoder to the difference signal generator 360. The difference signal is the input to an enhancement layer encoder 370, which generates the enhancement layer bit stream based on the difference signal.

In some implementations, the frames of the input audio signal are processed before or after the generation of the difference signal. In one embodiment, the difference signal is weighted and transformed into the frequency domain, for example using an MDCT, for processing by the enhancement layer encoder. In the enhancement layer, the error signal is composed of a weighted difference signal that is transformed into the MDCT (Modified Discrete Cosine Transform) domain for processing by an error signal encoder, for example, the enhancement layer encoder of Figures 2 and 3. The error signal E is given as:

E = MDCT{W(s - sc)}, Eqn. (1)

where W is a perceptual weighting matrix based on the Linear Prediction (LP) filter coefficients A(z) from the core layer decoder, s is a vector (i.e., a frame) of samples from the input audio signal s(n), and sc is the corresponding vector of samples from the core layer decoder. In one embodiment, the enhancement layer encoder uses a similar method of encoding for frames processed by the speech encoder and for frames processed by the generic audio encoder. In the case where the input frame is classified as a speech frame that is encoded by a CELP encoder, the linear prediction filter coefficients A(z) generated by the CELP encoder are available for weighting the corresponding error signal based on the difference between the input frame and the processed frame sc(n) returned by the speech (CELP) encoder. However, in the case where the input frame is classified as a generic audio frame encoded by a generic audio encoder using an MDCT-based encoding scheme, there are no LP filter coefficients available for weighting the error signal. To address this situation, in one embodiment, LP filter coefficients are first obtained by performing an LPC analysis on the processed frame sc(n) returned by the generic audio encoder, before the error signal is generated in the difference signal generator. The resulting LPC coefficients are then used to generate the perceptual weighting matrix W applied to the error signal prior to enhancement layer encoding. In another implementation, generating the error signal E includes modifying the signal sc(n) by pre-scaling.
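The error signal computation of Eqn. (1), including the LPC analysis fallback for generic audio frames, can be illustrated with the following minimal sketch. The disclosure specifies a perceptual weighting matrix W derived from A(z); here the common weighting filter form W(z) = A(z/γ1)/A(z/γ2), with assumed γ values, is substituted as a stand-in, and MDCT windowing and overlap details are omitted.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order=10):
    """LP coefficients [1, -a1, ..., -ap] via the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz(r[:order], r[1 : order + 1])  # normal equations
    return np.concatenate(([1.0], -a))

def perceptual_weight(e, a, gamma1=0.92, gamma2=0.68):
    """Apply W(z) = A(z/g1)/A(z/g2), an assumed stand-in for the matrix W."""
    num = a * gamma1 ** np.arange(len(a))
    den = a * gamma2 ** np.arange(len(a))
    return lfilter(num, den, e)

def mdct(x):
    """Direct-form MDCT of an even-length block (O(N^2); fine for a sketch)."""
    n = len(x) // 2
    ns, ks = np.arange(2 * n), np.arange(n)
    basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
    return basis @ x

def enhancement_error(s, s_c, is_generic_audio, a_celp=None):
    """E = MDCT{W(s - sc)} per Eqn. (1); A(z) is re-derived from sc(n)
    by LPC analysis when the core is the generic audio encoder."""
    a = lpc(s_c) if is_generic_audio else a_celp
    return mdct(perceptual_weight(s - s_c, a))
```

For a speech frame, a_celp would carry the A(z) coefficients already available from the CELP core; for a generic audio frame the coefficients are re-derived from sc(n), mirroring the fallback described above.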
In a particular embodiment, a plurality of error values are generated based on signals that are scaled with different gain values, wherein the error signal having a relatively low value is used to generate the enhancement layer bit stream. These and other aspects of error signal generation and processing are described more fully in the US publication corresponding to US Application No. 12/187423, entitled "Method and Apparatus for Generating an Enhancement Layer within an Audio Coding System".

In Figure 1, at 140, the enhancement layer encoded bit stream, the codeword, and the encoded bit stream, all based on a common frame of the input audio signal, are multiplexed into a combined bit stream. For example, if the frame of the input audio signal is classified as a speech frame, the encoded bit stream is produced by the speech encoder, the enhancement layer bit stream is based on the processed frame produced by the speech encoder, and the codeword indicates that the corresponding frame of the input audio signal is a speech frame. In the case where the frame of the input audio signal is classified as a generic audio frame, the encoded bit stream is produced by the generic audio encoder, the enhancement layer bit stream is based on the processed frame produced by the generic audio encoder, and the codeword indicates that the corresponding frame of the input audio signal is a generic audio frame. Likewise, for any other encoder, the codeword indicates the classification of the input audio frame, and the encoded bit stream and the processed frame are produced by the corresponding encoder.

In Figure 2, the codeword corresponding to the classification or mode selected by the mode selection entity 210 is sent to multiplexer 220. A second switch 252 at the output of encoders 230 and 240 couples the encoder corresponding to the selected mode to multiplexer 220, so that the corresponding encoded bit stream is communicated to the multiplexer. In particular, switch 252 couples the encoded bit stream output of either speech encoder 230 or generic audio encoder 240 to multiplexer 220. Switch 252 is controlled based on the mode selected or determined by mode selector 210. Switch 252 may be controlled by a processor based on the codeword output of the mode selector. The enhancement layer bit stream is also communicated from the enhancement layer encoder 270 to multiplexer 220. The multiplexer combines the codeword, the selected encoder bit stream, and the enhancement layer bit stream. For example, in the case of a generic audio frame, switch 250 couples the input signal to generic audio encoder 240 and switch 252 couples the output of the generic audio encoder to multiplexer 220. Switch 254 couples the processed frame generated by the generic audio encoder to the difference signal generator, the output of which is used to generate the enhancement layer bit stream, which is multiplexed with the codeword and the encoded bit stream. The multiplexed information may be aggregated for each frame of the input audio signal and stored and/or communicated for later decoding. The decoding of the combined information is discussed below.

In Figure 3, the codeword corresponding to the classification or mode selected by the mode selection entity 310 is sent to multiplexer 320. A second switch 352 at the output of encoders 330 and 340 couples the encoder corresponding to the selected mode to multiplexer 320, so that the corresponding encoded bit stream is communicated to the multiplexer.
In particular, switch 352 couples the encoded bit stream output of either speech encoder 330 or generic audio encoder 340 to multiplexer 320. Switch 352 is controlled based on the mode selected or determined by mode selector 310. Switch 352 may be controlled by a processor based on the codeword output of the mode selector. The enhancement layer bit stream is also communicated from the enhancement layer encoder 370 to multiplexer 320. The multiplexer combines the codeword, the selected encoder bit stream, and the enhancement layer bit stream. For example, in the case of a speech frame, switch 352 couples the output of speech encoder 330 to multiplexer 320. Switch 354 couples the processed frame generated by the speech encoder to the difference signal generator 360, the output of which is used to generate the enhancement layer bit stream, which is multiplexed with the codeword and the encoded bit stream. The multiplexed information may be aggregated for each frame of the input audio signal and stored and/or communicated for later decoding. The decoding of the combined information is discussed below.

Generally, the input audio signal may be subject to delay, by a delay entity not shown, inherent to the first and/or second encoder. In particular, a delay element may be required along one or more of the processing paths to synchronize the combined information at the multiplexer. For example, generating the enhancement layer bit stream may require more processing time than generating one of the encoded bit streams. Thus, it may be necessary to delay the encoded bit stream in order to synchronize it with the enhancement layer encoded bit stream. Communication of the codeword may also be delayed in order to synchronize the codeword with the encoded bit stream and the encoded enhancement layer. Alternatively, the multiplexer may store and hold the codeword and the encoded bit streams as they are generated, and perform the multiplexing only after all the elements to be combined have been received.

The input audio signal may be subject to filtering, by a filtering entity not shown, that precedes the first or second encoder. In one embodiment, the filtering entity performs re-sampling or rate conversion processing of the input signal. For example, an input audio signal sampled at 8, 16, or 32 kHz can be converted to a 12.8 kHz speech signal. More generally, the signal to all encoders may be subject to rate conversion, either up-sampling or down-sampling. In embodiments where one type of frame is subject to rate conversion and the other type of frame is not, it may be necessary to provide some delay in the processing of the frame that is not subject to rate conversion. One or more delay elements may also be desirable where the rate conversions of different frame types introduce different amounts of delay.

In one embodiment, the input audio signal is classified as a speech signal or a generic audio signal based on corresponding sets of processed audio frames produced by the different audio encoders. In the exemplary case of speech and generic audio signal processing, such an implementation implies that the input frame is processed by both the generic audio encoder and the speech encoder before the mode selection occurs or is determined.
In Figure 3, the mode selection entity 310 classifies an input frame of the input audio signal either as a speech frame or as a generic audio frame based on a processed speech frame generated by speech encoder 330 and on a processed generic audio frame generated by generic audio encoder 340. In a more specific implementation, the input frame is classified based on a comparison of first and second difference signals, in which the first difference signal is generated based on the input frame and the processed speech frame, and the second difference signal is generated based on the input frame and the processed generic audio frame. For example, an energy characteristic of a first set of difference signal audio samples associated with the first difference signal may be compared to an energy characteristic of a second set of difference signal audio samples associated with the second difference signal. To implement this latter approach, the schematic block diagram of Figure 3 would require some modification to couple the output of one or more difference signal generators to the mode selection entity 310. These implementations are also applicable to embodiments in which other types of encoders are employed.

In Figure 4, at 410, a combined bit stream is demultiplexed into an enhancement layer encoded bit stream, a codeword, and an encoded bit stream. In Figure 5, a demultiplexer 510 processes the combined bit stream to produce the codeword, the enhancement layer bit stream, and the encoded bit stream. The codeword indicates the selected mode and, in particular, the type of encoder used to produce the encoded bit stream. In the exemplary embodiment, the codeword indicates whether the encoded bit stream is a speech encoded bit stream or a generic audio encoded bit stream. More generally, however, the codeword may be indicative of an encoder other than a speech or generic audio encoder. Some examples of alternative encoders are discussed above. In Figure 5, a switch 512 selects a decoder to decode the encoded bit stream, based on the codeword. In particular, switch 512 selects a speech decoder 520 or a generic audio decoder 530, thereby routing or coupling the encoded bit stream to the appropriate decoder. The encoded bit stream is processed by the appropriate decoder to produce the processed audio frame, identified as s'c(n), which should be the same as the signal sc(n) on the encoder side provided that there are no channel errors. In most practical implementations, the processed audio frame s'c(n) will differ from the corresponding frame of the input signal s(n). In some embodiments, a second switch 514 couples the output of the selected decoder to an addition entity 540, the function of which is discussed further below. The state of one or more of the switches is controlled based on the selected mode, as indicated by the codeword, and may be controlled by a processor based on the codeword returned by the demultiplexer.

In Figure 4, at 430, the enhancement layer encoded bit stream is decoded into a decoded enhancement layer frame. In Figure 5, an enhancement layer decoder 550 decodes the enhancement layer encoded bit stream received from the demultiplexer 510. The decoded error signal is indicated as E', since the decoded error or difference is an approximation of the original error signal E. In Figure 4, at 440, the decoded enhancement layer frame is combined with the decoded audio frame.
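The multiplexing at 140 and the complementary decode path of Figures 4 and 5 can be sketched together as follows. The byte-level framing (a one-byte codeword and length-prefixed fields) is an assumption for illustration rather than a format prescribed by the disclosure, the decoder interfaces are hypothetical, and the enhancement layer decoder is assumed to return a time-domain error signal (inverse MDCT already applied); the inverse weighting step is elaborated in the next paragraph.

```python
import struct

SPEECH, GENERIC_AUDIO = 0, 1  # assumed one-bit codeword values

def mux_frame(codeword, core_bits, enh_bits):
    """Multiplexer 220/320 sketch: one-byte codeword plus length-prefixed
    core and enhancement layer bit streams (framing format assumed)."""
    return (struct.pack("<BH", codeword, len(core_bits)) + core_bits +
            struct.pack("<H", len(enh_bits)) + enh_bits)

def demux_frame(frame):
    """Demultiplexer 510 sketch: recover the codeword and both bit streams."""
    codeword, n_core = struct.unpack_from("<BH", frame, 0)
    core_bits = bytes(frame[3 : 3 + n_core])
    (n_enh,) = struct.unpack_from("<H", frame, 3 + n_core)
    enh_bits = bytes(frame[5 + n_core : 5 + n_core + n_enh])
    return codeword, core_bits, enh_bits

def decode_frame(frame, speech_dec, generic_dec, enh_dec):
    """Figure 5 decode path sketch (hypothetical decoder interfaces)."""
    codeword, core_bits, enh_bits = demux_frame(frame)
    core = speech_dec if codeword == SPEECH else generic_dec  # switch 512
    s_c = core.decode(core_bits)             # processed audio frame s'c(n)
    e = enh_dec.decode(enh_bits, codeword)   # approximate error signal E'
    if codeword == GENERIC_AUDIO:
        # Undo the perceptual weighting applied at the encoder; the
        # inverse_weight helper is an assumed counterpart of W, with A(z)
        # re-derivable from s'c(n) as at the encoder.
        e = enh_dec.inverse_weight(e, s_c)
    return s_c + e                           # addition entity 540: s'(n)
```

On the encoder side, the triple returned by either encode sketch above would be passed to mux_frame to form one frame of the combined bit stream.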
In the decoding signal processor of Figure 5, the approximate error signal E' is combined with the processed audio signal s'c(n) to reconstruct the corresponding estimate s'(n) of the input frame. In embodiments where the error signal is weighted, for example by the weighting matrix of Equation (1) above, and where the encoded bit stream is a generic audio encoded bit stream, an inverse weighting matrix is applied to the weighted error signal before the combination. These and other aspects of the reconstruction of the original input frame, depending on the generation and processing of the error signal, are described more fully in the US publication corresponding to US Application No. 12/187423, entitled "Method and Apparatus for Generating an Enhancement Layer within an Audio Coding System".

Although the present disclosure and the best modes thereof have been described in a manner establishing possession and enabling those of ordinary skill to make and use the same, it will be understood and appreciated that there are equivalents to the exemplary embodiments disclosed herein, and that modifications and variations may be made thereto without departing from the scope and spirit of the invention, which is to be limited not by the exemplary embodiments but by the appended claims.

What is claimed is:
Claims:
Claims (15)
[0001] 1. Method for encoding an audio signal, characterized by the fact that it comprises: classifying an input frame either as a speech frame or as a generic audio frame, the input frame being based on the audio signal; producing an encoded bit stream and a corresponding processed frame based on the input frame; producing an enhancement layer encoded bit stream based on a difference between the input frame and the processed frame; and multiplexing the enhancement layer encoded bit stream, a codeword, and either a speech encoded bit stream or a generic audio encoded bit stream into a combined bit stream, based on whether the codeword indicates that the input frame is classified as a speech frame or as a generic audio frame, wherein the encoded bit stream is either a speech encoded bit stream or a generic audio encoded bit stream.
[0002] 2. Method according to claim 1, characterized by the fact that it comprises producing at least one speech encoded bit stream and at least one corresponding processed speech frame based on the input frame when the input frame is classified as a speech frame, and producing at least one generic audio encoded bit stream and at least one processed generic audio frame based on the input frame when the input frame is classified as a generic audio frame, multiplexing the enhancement layer encoded bit stream, the speech encoded bit stream, and the codeword into the combined bit stream only when the input frame is classified as a speech frame, and multiplexing the enhancement layer encoded bit stream, the generic audio encoded bit stream, and the codeword into the combined bit stream only when the input frame is classified as a generic audio frame.
[0003] 3. Method according to claim 2, characterized by the fact that it comprises producing the enhancement layer encoded bit stream based on the difference between the input frame and the processed frame, wherein the processed frame is a processed speech frame when the input frame is classified as a speech frame, and wherein the processed frame is a processed generic audio frame when the input frame is classified as a generic audio frame.
[0004] 4. Method according to claim 3, characterized by the fact that the processed frame is a generic audio frame, the method further comprising obtaining linear prediction filter coefficients by performing a linear prediction coding analysis of the processed frame from the generic audio encoder, and weighting the difference between the input frame and the processed frame from the generic audio encoder based on the linear prediction filter coefficients.
[0005] 5. Method according to claim 1, characterized by the fact that it comprises producing the speech encoded bit stream and a corresponding processed speech frame only when the input frame is classified as a speech frame, producing the generic audio encoded bit stream and a corresponding processed generic audio frame only when the input frame is classified as a generic audio frame, multiplexing the enhancement layer encoded bit stream, the speech encoded bit stream, and the codeword into the combined bit stream only when the input frame is classified as a speech frame, and multiplexing the enhancement layer encoded bit stream, the generic audio encoded bit stream, and the codeword into the combined bit stream only when the input frame is classified as a generic audio frame.
[0006] 6.
Method according to claim 5, characterized by the fact that it comprises producing the enhancement layer encoded bit stream based on the difference between the input frame and the processed frame, wherein the processed frame is a processed speech frame when the input frame is classified as a speech frame, and wherein the processed frame is a processed generic audio frame when the input frame is classified as a generic audio frame.
[0007] 7. Method according to claim 6, characterized by the fact that it comprises classifying the input frame before producing either the speech encoded bit stream or the generic audio encoded bit stream.
[0008] 8. Method according to claim 6, characterized by the fact that the processed frame is a generic audio frame, the method further comprising obtaining linear prediction filter coefficients by performing a linear prediction coding analysis of the processed frame from the generic audio encoder, and weighting the difference between the input frame and the processed frame from the generic audio encoder based on the linear prediction filter coefficients.
[0009] 9. Method according to claim 1, characterized by the fact that producing the corresponding processed frame includes producing a processed speech frame and producing a processed generic audio frame, and classifying the input frame based on the processed speech frame and the processed generic audio frame.
[0010] 10. Method according to claim 9, characterized by the fact that it comprises producing a first difference signal based on the input frame and the processed speech frame, producing a second difference signal based on the input frame and the processed generic audio frame, and classifying the input frame based on a comparison of the first difference and the second difference.
[0011] 11. Method according to claim 10, characterized by the fact that it comprises classifying the input signal either as a speech signal or as a generic audio signal based on a comparison of an energy characteristic of a first set of difference signal audio samples associated with the first difference signal and of a second set of difference signal audio samples associated with the second difference signal.
[0012] 12. Method according to claim 1, characterized by the fact that the processed frame is a generic audio frame, the method further comprising obtaining linear prediction filter coefficients by performing a linear prediction coding analysis of the processed frame from the generic audio encoder, weighting the difference between the input frame and the processed frame from the generic audio encoder based on the linear prediction filter coefficients, and producing the enhancement layer encoded bit stream based on the weighted difference.
[0013] 13. Method for decoding an audio signal, characterized by the fact that it comprises: demultiplexing a combined bit stream into an enhancement layer encoded bit stream, a codeword, and an encoded bit stream, the codeword indicating whether the encoded bit stream is a speech encoded bit stream or a generic audio encoded bit stream; decoding the enhancement layer encoded bit stream into a decoded enhancement layer frame; decoding the encoded bit stream into a decoded audio frame, wherein the encoded bit stream is decoded using a speech decoder or a generic audio decoder depending on whether the codeword indicates that the encoded bit stream is a speech encoded bit stream or a generic audio encoded bit stream; and combining the decoded enhancement layer frame and the decoded audio frame.
[0014] 14. Method according to claim 13, characterized by the fact that it comprises determining whether to decode the encoded bit stream using a speech decoder or a generic audio decoder based on whether the codeword indicates that the decoded audio signal is a speech signal or a generic audio signal.
[0015] 15. Method according to claim 13, characterized by the fact that the decoded enhancement layer frame is a weighted error signal and the encoded bit stream is a generic audio encoded bit stream, the method further comprising applying an inverse weighting matrix to the weighted error signal before the combining.
Family patents:
Publication number | Publication date
US20110161087A1 | 2011-06-30
EP2519945B1 | 2015-01-21
CN102687200A | 2012-09-19
EP2519945A1 | 2012-11-07
WO2011081751A1 | 2011-07-07
US8442837B2 | 2013-05-14
CN102687200B | 2014-12-10
KR20120109600A | 2012-10-08
BR112012016370A2 | 2018-05-15
KR101380431B1 | 2014-04-01
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
GB9512284D0 | 1995-06-16 | 1995-08-16 | Nokia Mobile Phones Ltd | Speech Synthesiser
US6263312B1 | 1997-10-03 | 2001-07-17 | Alaris, Inc. | Audio compression and decompression employing subband decomposition of residual signal and distortion reduction
IL129752A | 1999-05-04 | 2003-01-12 | Eci Telecom Ltd | Telecommunication method and system for using same
US6236960B1 | 1999-08-06 | 2001-05-22 | Motorola, Inc. | Factorial packing method and apparatus for information coding
JP3404024B2 | 2001-02-27 | 2003-05-06 | Mitsubishi Electric Corporation | Audio encoding method and audio encoding device
US6658383B2 | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals
US6950794B1 | 2001-11-20 | 2005-09-27 | Cirrus Logic, Inc. | Feedforward prediction of scalefactors based on allowable distortion for noise shaping in psychoacoustic-based compression
KR100711989B1 | 2002-03-12 | 2007-05-02 | Nokia Corporation | Efficient improvements in scalable audio coding
JP3881943B2 | 2002-09-06 | 2007-02-14 | Matsushita Electric Industrial Co., Ltd. | Acoustic encoding apparatus and acoustic encoding method
US7876966B2 | 2003-03-11 | 2011-01-25 | Spyder Navigations L.L.C. | Switching between coding schemes
KR101000345B1 | 2003-04-30 | 2010-12-13 | Panasonic Corporation | Audio encoding device, audio decoding device, audio encoding method, and audio decoding method
SE527670C2 | 2003-12-19 | 2006-05-09 | Ericsson Telefon Ab L M | Natural fidelity optimized coding with variable frame length
CA2566372A1 | 2004-05-17 | 2005-11-24 | Nokia Corporation | Audio encoding with different coding models
US7739120B2 | 2004-05-17 | 2010-06-15 | Nokia Corporation | Selection of coding models for encoding an audio signal
US20060047522A1 | 2004-08-26 | 2006-03-02 | Nokia Corporation | Method, apparatus and computer program to provide predictor adaptation for advanced audio coding system
WO2006030864A1 | 2004-09-17 | 2006-03-23 | Matsushita Electric Industrial Co., Ltd. | Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method
US7461106B2 | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals
CN101145345B | 2006-09-13 | 2011-02-09 | Huawei Technologies Co., Ltd. | Audio frequency classification method
CA2697604A1 | 2007-09-28 | 2009-04-02 | Voiceage Corporation | Method and device for efficient quantization of transform information in an embedded speech and audio codec
US8209190B2 | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system
CN101335000B | 2008-03-26 | 2010-04-21 | Huawei Technologies Co., Ltd. | Method and apparatus for encoding
WO2009118044A1 | 2008-03-26 | 2009-10-01 | Nokia Corporation | An audio signal classifier
US8639519B2 | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance
CN101281749A | 2008-05-22 | 2008-10-08 | Shanghai Jiao Tong University | Apparatus for encoding and decoding hierarchical voice and musical sound together
EP2352147B9 | 2008-07-11 | 2014-04-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | An apparatus and a method for encoding an audio signal
WO2010031003A1 | 2008-09-15 | 2010-03-18 | Huawei Technologies Co., Ltd. | Adding second enhancement layer to CELP based core layer

Cited by:
US7461106B2 | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals
US8576096B2 | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals
US20090234642A1 | 2008-03-13 | 2009-09-17 | Motorola, Inc. | Method and Apparatus for Low Complexity Combinatorial Coding of Signals
US8639519B2 | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance
KR20100006492A | 2008-07-09 | 2010-01-19 | Samsung Electronics Co., Ltd. | Method and apparatus for deciding encoding mode
US8175888B2 | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system
US8219408B2 | 2008-12-29 | 2012-07-10 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal
US8200496B2 | 2008-12-29 | 2012-06-12 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal
US8423355B2 | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames
US8428936B2 | 2010-03-05 | 2013-04-23 | Motorola Mobility Llc | Decoder for audio signal including generic audio and speech frames
US9129600B2 | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal
CN103915097B | 2013-01-04 | 2017-03-22 | China Mobile Communications Group | Voice signal processing method, device and system
MY177336A | 2013-01-29 | 2020-09-12 | Fraunhofer Ges Forschung | Concept for coding mode switching compensation
KR101717006B1 | 2013-04-05 | 2017-03-15 | Dolby International AB | Audio processing system
FR3024582A1 | 2014-07-29 | 2016-02-05 | Orange | Managing frame loss in a FD/LPD transition context
JP6384620B2 | 2015-09-15 | 2018-09-05 | Murata Manufacturing Co., Ltd. | Contact detection device
KR20200030912A | 2018-09-13 | 2020-03-23 | LINE Plus Corporation | Apparatus and method for providing call quality information
Legal status:
2018-06-19 | B25D | Requested change of name of applicant approved | Owner name: MOTOROLA MOBILITY LLC (US)
2018-07-03 | B25A | Requested transfer of rights approved | Owner name: GOOGLE TECHNOLOGY HOLDINGS LLC (US)
2019-01-08 | B06F | Objections, documents and/or translations needed after an examination request [chapter 6.6 patent gazette]
2019-12-17 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2020-04-22 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2020-09-15 | B16A | Patent or certificate of addition of invention granted | Free format text: term of validity: 20 (twenty) years counted from 29/11/2010, subject to the legal conditions
Priority:
Application number | Filing date | Patent title
US 12/650,970 | 2009-12-31 | Embedded speech and audio coding using a switchable model core
PCT/US2010/058193 | 2010-11-29 | Embedded speech and audio coding using a switchable model core